Avoiding Communication through a Multilevel LU Factorization
نویسندگان
چکیده
Due to the evolution of massively parallel computers towards deeper levels of parallelism and memory hierarchy, and due to the exponentially increasing ratio of the time required to transfer data, either through the memory hierarchy or between different compute units, to the time required to compute floating point operations, the algorithms are confronted with two challenges. They need not only to be able to exploit multiple levels of parallelism, but also to reduce the communication between the compute units at each level of the hierarchy of parallelism and between the different levels of the memory hierarchy. In this paper we present an algorithm for performing the LU factorization of dense matrices that is suitable for computer systems with two levels of parallelism. This algorithm is able to minimize both the volume of communication and the number of messages transferred at every level of the two-level hierarchy of parallelism. We present its implementation for a cluster of multicore processors based on MPI and Pthreads. We show that this implementation leads to a better performance than routines implementing the LU factorization in well-known numerical libraries. For matrices that are tall and skinny, that is they have many more rows than columns, our algorithm outperforms the corresponding algorithm from ScaLAPACK by a factor of 4.5 on a cluster of 32 nodes, each node having two quad-core Intel Xeon EMT64 processors.
منابع مشابه
LU factorization with panel rank revealing pivoting and its communication avoiding version
We present the LU decomposition with panel rank revealing pivoting (LU PRRP), an LU factorization algorithm based on strong rank revealing QR panel factorization. LU PRRP is more stable than Gaussian elimination with partial pivoting (GEPP), with a theoretical upper bound of the growth factor of (1+ τb) n b , where b is the size of the panel used during the block factorization, τ is a parameter...
متن کاملPerformance Predictions of Multilevel Communication Optimal LU and QR Factorizations on Hierarchical Platforms
In this paper we study the performance of two classical dense linear algebra algorithms, the LU and the QR factorizations, on multilevel hierarchical platforms. We note that we focus on multilevel QR factorization, and give a brief description of the multilevel LU factorization. We first introduce a performance model called Hierarchical Cluster Platform (Hcp), encapsulating the characteristics ...
متن کاملCalculs pour les matrices denses : coût de communication et stabilité numérique. (Dense matrix computations : communication cost and numerical stability)
This dissertation focuses on a widely used linear algebra kernel to solve linear systems, that is the LU decomposition. Usually, to perform such a computation one uses the Gaussian elimination with partial pivoting (GEPP). The backward stability of GEPP depends on a quantity which is referred to as the growth factor, it is known that in general GEPP leads to modest element growth in practice. H...
متن کاملTHE USE OF SEMI INHERITED LU FACTORIZATION OF MATRICES IN INTERPOLATION OF DATA
The polynomial interpolation in one dimensional space R is an important method to approximate the functions. The Lagrange and Newton methods are two well known types of interpolations. In this work, we describe the semi inherited interpolation for approximating the values of a function. In this case, the interpolation matrix has the semi inherited LU factorization.
متن کاملParallel Multilevel Block ILU Preconditioning Techniques for Large Sparse Linear Systems
We present a class of parallel preconditioning strategies built on a multilevel block incomplete LU (ILU) factorization technique to solve large sparse linear systems on distributed memory parallel computers. The preconditioners are constructed by using the concept of block independent sets. Two algorithms for constructing block independent sets of a distributed sparse matrix are proposed. We c...
متن کامل